Experiences with High-Level Programming Directives for Porting Applications to GPUs

Authors

Oscar R. Hernandez

Wei Ding

Barbara M. Chapman

Christos Kartsaklis

Ramanan Sankaran

Richard L. Graham

Abstract

HPC systems now exploit GPUs within their compute nodes to accelerate program performance. As a result, high-end application development has become extremely complex at the node level. In addition to restructuring the node code to exploit the cores and specialized devices, the programmer may need to choose a programming model such as OpenMP or CPU threads in conjunction with an accelerator programming model to share and manage the different node resources. This comes at a time when programmer productivity and the ability to produce portable code have been recognized as major concerns. To offset the high development cost of creating CUDA or OpenCL kernels, directives have been proposed for programming accelerator devices, but their implications are not well known. In this paper, we evaluate state-of-the-art accelerator directives for programming several application kernels, explore transformations to achieve good performance, and examine the expressivity and performance penalty of using high-level directives versus CUDA. We also compare our results to OpenMP implementations to understand the benefits of running the kernels on the accelerator versus on CPU cores.

Similar resources

Compiler Support for High-level GPU Programming

We design a high-level abstraction of CUDA, called hiCUDA, using compiler directives. It simplifies the tasks in porting sequential applications to NVIDIA GPUs. This paper focuses on the design and implementation of a source-to-source compiler that translates a hiCUDA program into an equivalent CUDA program, and shows that the performance of CUDA code generated by this compiler is comparable to...

Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

This paper presents our experiences of parallelizing the sequential implementation of the NAS benchmarks using compiler directives on the SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of port...

Trellis: Portability across architectures with a high-level framework

The increasing computational needs of parallel applications inevitably require portability across parallel architectures, which now include heterogeneous processing resources, such as CPUs and GPUs, and multiple SIMD/SIMT widths. However, the lack of a common parallel programming paradigm that provides predictable, near-optimal performance on each resource leads to the use of low-level framewor...

Experiences of Using the OpenMP Accelerator Model to Port DOE Stencil Applications

The Department of Energy has a wide range of large-scale, parallel scientific applications running on cutting-edge high-performance computing systems to support its mission and tackle critical science challenges. A recent trend in these high-performance computing systems is to add commodity accelerators, such as Nvidia GPUs and Intel Xeon Phi coprocessors, into computer nodes so we can achieve ...

Publication date: 2011
